305 research outputs found

    BitSearch, the blog before the thesis


    UPC at MediaEval 2013 social event detection task

    These working notes present the contribution of the UPC team to the Social Event Detection (SED) task in MediaEval 2013. The proposal extends the previous PhotoTOC work in the domain of shared collections of photographs stored in cloud services. An initial over-segmentation of the photo collection is later refined by merging pairs of similar clusters.
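    As a rough illustration of the over-segment-then-merge strategy described above, the sketch below over-segments a photo collection by time gaps and then greedily merges adjacent clusters whose mean descriptors are similar. The thresholds, the time-gap heuristic and the feature choice are illustrative assumptions, not the actual configuration of the UPC MediaEval 2013 system.

```python
# Minimal sketch of the over-segment-then-merge idea; all constants are assumptions.
import numpy as np

def oversegment_by_time(timestamps, gap_seconds=300):
    """Start a new cluster whenever the gap between consecutive photos
    exceeds `gap_seconds` (deliberately small -> over-segmentation)."""
    order = np.argsort(timestamps)
    clusters, current = [], [order[0]]
    for prev, nxt in zip(order[:-1], order[1:]):
        if timestamps[nxt] - timestamps[prev] > gap_seconds:
            clusters.append(current)
            current = []
        current.append(nxt)
    clusters.append(current)
    return clusters

def merge_similar_clusters(clusters, features, sim_threshold=0.8):
    """Greedily merge adjacent clusters whose mean feature vectors
    (e.g. visual descriptors) are cosine-similar enough."""
    def centroid(idx):
        v = features[idx].mean(axis=0)
        return v / (np.linalg.norm(v) + 1e-8)

    merged = [clusters[0]]
    for c in clusters[1:]:
        if float(centroid(merged[-1]) @ centroid(c)) > sim_threshold:
            merged[-1] = merged[-1] + c   # fold into the previous cluster
        else:
            merged.append(c)
    return merged

# Toy usage with random data standing in for real photo metadata/features.
rng = np.random.default_rng(0)
timestamps = np.sort(rng.uniform(0, 86400, size=50))
features = rng.normal(size=(50, 128))
events = merge_similar_clusters(oversegment_by_time(timestamps), features)
print(len(events), "event clusters")
```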

    Hyperparameter-free losses for model-based monocular reconstruction

    This work proposes novel hyperparameter-free losses for single-view 3D reconstruction with morphable models (3DMM). We dispense with the hyperparameters used in other works by exploiting geometry, so that the shape of the object and the camera pose are jointly optimized in a single-term expression. This simplification reduces the optimization time and its complexity. Moreover, we propose a novel implicit regularization technique based on random virtual projections that does not require additional 2D or 3D annotations. Our experiments suggest that minimizing a shape reprojection error together with the proposed implicit regularization is especially suitable for applications that require precise alignment between geometry and image spaces, such as augmented reality. We evaluate our losses on a large-scale dataset with 3D ground truth and publish our implementations to facilitate reproducibility and public benchmarking in this field.
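    The sketch below shows one plausible reading of the losses described above: a single reprojection term that couples the 3DMM shape coefficients with the camera pose, plus a guessed form of the random-virtual-projection regularizer. Every function, projection model and constant here is an assumption rather than the paper's exact formulation.

```python
# Illustrative numpy sketch; not the paper's implementation.
import numpy as np

def project(points3d, rotation, translation, focal=1.0):
    """Simple perspective projection of Nx3 points to Nx2 image coordinates."""
    cam = points3d @ rotation.T + translation          # rigid transform
    return focal * cam[:, :2] / (cam[:, 2:3] + 1e-8)   # perspective divide

def reprojection_loss(mean_shape, basis, coeffs, rotation, translation,
                      landmarks2d, landmark_idx):
    """Single term coupling shape coefficients and pose: reproject the
    morphable-model landmarks and compare against detected 2D landmarks."""
    shape = (mean_shape + basis @ coeffs).reshape(-1, 3)
    proj = project(shape[landmark_idx], rotation, translation)
    return np.mean(np.sum((proj - landmarks2d) ** 2, axis=1))

def virtual_projection_reg(mean_shape, basis, coeffs, n_views=8, seed=0):
    """Guessed implicit regularizer: under random virtual cameras, the morphed
    shape should not project too far from the mean shape (no 2D/3D labels)."""
    rng = np.random.default_rng(seed)
    shape = (mean_shape + basis @ coeffs).reshape(-1, 3)
    mean = mean_shape.reshape(-1, 3)
    total = 0.0
    for _ in range(n_views):
        q, _ = np.linalg.qr(rng.normal(size=(3, 3)))    # random orthogonal matrix
        t = np.array([0.0, 0.0, 5.0])                   # push in front of camera
        total += np.mean((project(shape, q, t) - project(mean, q, t)) ** 2)
    return total / n_views

# Toy usage with a random 5-vertex model (purely illustrative).
rng = np.random.default_rng(1)
mean_shape = rng.normal(size=15)
basis = 0.1 * rng.normal(size=(15, 4))
coeffs = rng.normal(size=4)
R, t = np.eye(3), np.array([0.0, 0.0, 5.0])
print(reprojection_loss(mean_shape, basis, coeffs, R, t,
                        rng.normal(size=(5, 2)), np.arange(5)),
      virtual_projection_reg(mean_shape, basis, coeffs))
```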

    Temporal activity detection in untrimmed videos with recurrent neural networks

    This work proposes a simple pipeline to classify and temporally localize activities in untrimmed videos. Our system uses features from a 3D Convolutional Neural Network (C3D) as input to train a recurrent neural network (RNN) that learns to classify video clips of 16 frames. After clip prediction, we post-process the output of the RNN to assign a single activity label to each video, and determine the temporal boundaries of the activity within the video. We show how our system can achieve competitive results in both tasks with a simple architecture. We evaluate our method in the ActivityNet Challenge 2016, achieving a 0.5874 mAP and a 0.2237 mAP in the classification and detection tasks, respectively. Our code and models are publicly available at: https://imatge-upc.github.io/activitynet-2016-cvprw/
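    A rough PyTorch sketch of the clip-level pipeline described in the abstract: pre-extracted C3D features for 16-frame clips feed an RNN that emits a class per clip, and a simple post-processing step derives one video label plus temporal boundaries. The dimensions, number of classes and the post-processing heuristic are assumptions, not the exact configuration used for ActivityNet 2016.

```python
import torch
import torch.nn as nn

class ClipRNN(nn.Module):
    def __init__(self, feat_dim=4096, hidden=512, n_classes=201):  # 200 + background (assumed)
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.cls = nn.Linear(hidden, n_classes)

    def forward(self, clip_feats):          # (batch, n_clips, feat_dim)
        out, _ = self.rnn(clip_feats)
        return self.cls(out)                # per-clip class scores

def postprocess(clip_scores, background=0, clip_len=16, fps=25.0):
    """Assign one activity label per video and report its temporal extent as
    the contiguous clips predicted with that label (a simplifying assumption)."""
    pred = clip_scores.argmax(dim=-1)                    # (n_clips,)
    fg = pred[pred != background]
    if fg.numel() == 0:
        return background, []
    label = int(torch.mode(fg).values)
    segments, start = [], None
    for i, p in enumerate(pred.tolist() + [background]):  # sentinel closes last run
        if p == label and start is None:
            start = i
        elif p != label and start is not None:
            segments.append((start * clip_len / fps, i * clip_len / fps))
            start = None
    return label, segments

model = ClipRNN()
scores = model(torch.randn(1, 20, 4096))      # one video, 20 clips of C3D features
print(postprocess(scores[0]))
```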

    Skin lesion classification from dermoscopic images using deep learning techniques

    The recent emergence of deep learning methods for medical image analysis has enabled the development of intelligent medical imaging-based diagnosis systems that can assist the human expert in making better decisions about a patient’s health. In this paper we focus on the problem of skin lesion classification, particularly early melanoma detection, and present a deep-learning based approach to solve the problem of classifying a dermoscopic image containing a skin lesion as malignant or benign. The proposed solution is built around the VGGNet convolutional neural network architecture and uses the transfer learning paradigm. Experimental results are encouraging: on the ISIC Archive dataset, the proposed method achieves a sensitivity value of 78.66%, which is significantly higher than the current state of the art on that dataset.
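    A hedged sketch of the transfer-learning setup the abstract describes: a VGG backbone pretrained on ImageNet with its last layer replaced for the binary malignant/benign decision. Which layers are frozen or fine-tuned, and the training details, are assumptions rather than the paper's exact recipe.

```python
import torch
import torch.nn as nn
from torchvision import models

def build_lesion_classifier(freeze_backbone=True):
    net = models.vgg16(weights="IMAGENET1K_V1")   # older torchvision: pretrained=True
    if freeze_backbone:
        for p in net.features.parameters():       # keep convolutional filters fixed
            p.requires_grad = False
    net.classifier[6] = nn.Linear(4096, 1)        # single malignancy logit
    return net

model = build_lesion_classifier()
criterion = nn.BCEWithLogitsLoss()                # benign = 0, malignant = 1
dummy = torch.randn(2, 3, 224, 224)               # dermoscopic images after resizing
logits = model(dummy).squeeze(1)
loss = criterion(logits, torch.tensor([0.0, 1.0]))
print(loss.item())
```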

    System architecture for indexing regions in keyframes

    This paper describes the design of an indexing system for a video database. The system uses region-based manual annotations of keyframes to create models to automatically annotate new keyframes, also at the region level. The presented architecture includes user interfaces for training and querying the system, internal databases to manage ingested content and modelled semantic classes, as well as communication interfaces to allow the system interconnection. The scheme is designed to work as a plug-in to an external Multimedia Asset Management (MAM) system.
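    As a toy stand-in for the region-level train/annotate loop that this architecture supports, the sketch below trains per-class models from manually annotated region descriptors and then labels the regions of a new keyframe. The real system's descriptors, models, databases and MAM plug-in interfaces are not reproduced here; everything below is an illustrative assumption.

```python
import numpy as np

class RegionIndex:
    def __init__(self):
        self.centroids = {}                     # semantic class -> mean descriptor

    def train(self, annotations):
        """annotations: list of (class_name, region_descriptor) pairs
        coming from the manual annotation interface."""
        by_class = {}
        for name, desc in annotations:
            by_class.setdefault(name, []).append(desc)
        self.centroids = {n: np.mean(d, axis=0) for n, d in by_class.items()}

    def annotate(self, region_descriptors, max_dist=1.0):
        """Assign each region of a new keyframe to the closest class model,
        or None when nothing is close enough."""
        labels = []
        for desc in region_descriptors:
            name, dist = min(((n, np.linalg.norm(desc - c))
                              for n, c in self.centroids.items()),
                             key=lambda x: x[1])
            labels.append(name if dist <= max_dist else None)
        return labels

index = RegionIndex()
index.train([("sky", np.array([0.9, 0.1])), ("grass", np.array([0.1, 0.9]))])
print(index.annotate([np.array([0.8, 0.2]), np.array([0.2, 0.8])]))
```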

    Assessing knee OA severity with CNN attention-based end-to-end architectures

    This work proposes a novel end-to-end convolutional neural network (CNN) architecture to automatically quantify the severity of knee osteoarthritis (OA) using X-Ray images, which incorporates trainable attention modules acting as unsupervised fine-grained detectors of the region of interest (ROI). The proposed attention modules can be applied at different levels and scales across any CNN pipeline, helping the network to learn relevant attention patterns over the most informative parts of the image at different resolutions. We test the proposed attention mechanism on existing state-of-the-art CNN architectures as our base models, achieving promising results on the benchmark knee OA datasets from the osteoarthritis initiative (OAI) and multicenter osteoarthritis study (MOST).
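    A minimal PyTorch sketch of the kind of trainable spatial-attention gate the abstract describes: a small module that can be dropped between CNN stages at different resolutions and learns, without ROI supervision, which image regions to emphasize. The exact module design in the paper may differ; the layer sizes below are assumptions.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Sequential(
            nn.Conv2d(channels, channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, 1, kernel_size=1),  # one attention map
            nn.Sigmoid(),
        )

    def forward(self, feats):                  # (B, C, H, W)
        attn = self.score(feats)               # (B, 1, H, W) in [0, 1]
        return feats * attn, attn              # gated features + map for inspection

# Illustrative use between two convolutional stages of a backbone.
stage1 = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
attend = SpatialAttention(32)
stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())

xray = torch.randn(2, 1, 224, 224)            # grayscale knee X-ray batch
gated, attn_map = attend(stage1(xray))
print(stage2(gated).shape, attn_map.shape)
```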

    An interactive lifelog search engine for LSC2018

    In this work, we describe an interactive lifelog search engine developed for the LSC 2018 search challenge at ACM ICMR 2018. The paper introduces the four-step process required to support lifelog search engines and describes the source data for the search engine as well as the approach to ranking chosen for the iterative search engine. Finally, the interface is introduced before we highlight the limits of the current prototype and suggest opportunities for future work.

    Hierarchical object detection with deep reinforcement learning

    We present a method for performing hierarchical object detection in images guided by a deep reinforcement learning agent. The key idea is to focus on those parts of the image that contain richer information and zoom in on them. We train an intelligent agent that, given an image window, is capable of deciding where to focus its attention among five different predefined region candidates (smaller windows). This procedure is iterated, providing a hierarchical image analysis. We compare two different candidate proposal strategies to guide the object search: with and without overlap. Moreover, our work compares two different strategies to extract features from a convolutional neural network for each region proposal: a first one that computes new feature maps for each region proposal, and a second one that computes the feature maps for the whole image and later generates crops for each region proposal. Experiments indicate better results for the overlapping candidate proposal strategy and a loss of performance for the cropped image features due to the loss of spatial resolution. We argue that, while this loss seems unavoidable when working with large numbers of object candidates, the much smaller number of region proposals generated by our reinforcement learning agent makes it feasible to extract features for each location without sharing convolutional computation among regions.
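    The sketch below illustrates the hierarchical zooming loop: at each step an agent picks one of five predefined sub-windows of the current window (four corners plus a central one) and the search recurses into it. The sub-window sizes, the stopping rule and the random stand-in "agent" are illustrative assumptions, not the trained deep RL policy from the paper.

```python
import random

def candidate_windows(x0, y0, x1, y1, overlap=True):
    """Five child windows of (x0, y0, x1, y1); with overlap=True the children
    cover a larger fraction of each side and therefore overlap each other."""
    w, h = x1 - x0, y1 - y0
    scale = 0.75 if overlap else 0.5            # assumed sizes for the two strategies
    cw, ch = w * scale, h * scale
    corners = [(x0, y0), (x1 - cw, y0), (x0, y1 - ch), (x1 - cw, y1 - ch)]
    center = (x0 + (w - cw) / 2, y0 + (h - ch) / 2)
    return [(cx, cy, cx + cw, cy + ch) for cx, cy in corners + [center]]

def hierarchical_search(image_size, agent, max_steps=5, overlap=True):
    """Iteratively zoom into the window selected by the agent."""
    window = (0.0, 0.0, float(image_size[0]), float(image_size[1]))
    for _ in range(max_steps):
        children = candidate_windows(*window, overlap=overlap)
        action = agent(window, children)        # index of chosen child, or None to stop
        if action is None:
            break
        window = children[action]
    return window

random_agent = lambda window, children: random.randrange(len(children))
print(hierarchical_search((640, 480), random_agent))
```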